knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(tidyverse)
library(here)
library(broom)

# Spatial data packages
library(sf)
library(tmap)

Part 0: Lab set-up

Part 1: Spatial data wrangling, visualization, and a variogram

In the Week 7 lecture, we learned a bit more about projection and coordinate reference systems, types of spatial data, and investigating spatial autocorrelation using variograms. In this first part of lab we’ll practice working with spatial data, then move on to variograms and spatial interpolation in the next part. We’ll look at point pattern analysis (exploring spatial clustering) next week.

Today, we’ll use vector data (polygons, points) to practice reading in spatial data, checking & updating the CRS, and doing some wrangling and visualization.

We’ll use several datasets:

A. California county outlines (polygons)

Read it in with read_sf

First, let’s read in the California county shapefile:

ca_counties_sf <- read_sf(here("data", "ca_counties", "CA_Counties_TIGER2016.shp"))

Do a bit of wrangling (and see sticky geometry!)

Use View(ca_counties) to check out what it contains. Let’s simplify it by only keeping two attributes: NAME (county name) and ALAND (land area), then renaming those to county_name and land_area.

ca_subset_sf <- ca_counties_sf %>% 
  janitor::clean_names() %>%
  select(county_name = name, land_area = aland)

head(ca_subset_sf) ### WARN AGAINST View()
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -13565690 ymin: 3862173 xmax: -13096340 ymax: 4833572
## Projected CRS: WGS 84 / Pseudo-Mercator
## # A tibble: 6 × 3
##   county_name     land_area                                             geometry
##   <chr>               <dbl>                                   <MULTIPOLYGON [m]>
## 1 Sierra         2468694587 (((-13431320 4821511, -13431313 4821520, -13431300 …
## 2 Sacramento     2499183617 (((-13490651 4680832, -13490511 4680817, -13490465 …
## 3 Santa Barbara  7084000598 (((-13423117 4042044, -13423158 4043249, -13422800 …
## 4 Calaveras      2641820834 (((-13428575 4627725, -13428535 4627889, -13428535 …
## 5 Ventura        4773390489 (((-13317854 3931602, -13317828 3932624, -13317686 …
## 6 Los Angeles   10510651024 (((-13210018 3958856, -13210085 3959984, -13209920 …

Take a look at ca_subset_sf. We should notice something very important about a simple features (sf) object: it just assumes you want to keep the spatial information, and you can work with the rest of the data as if it’s a non-spatial data frame (and the spatial information just “sticks” - hence the term “sticky geometry”). So even though we only called (and renamed) name and aland in the select() function, we see that the geometry column still exists!

What if we wanted just the dataframe, without the geometry? Convert to dataframe and select out the geometry column:

ca_counties_df <- ca_counties_sf %>%
  as.data.frame() %>%
  select(-geometry)

Check and set the CRS

Use st_crs() to check the existing CRS for spatial data. We see that this CRS is “pseudo-mercator” based on WGS 84 - primarily used for web mapping, not analysis. WGS84 (epsg:3857), also note proj4 string and WKT definitions.

ca_subset_sf %>% st_crs()
## Coordinate Reference System:
##   User input: WGS 84 / Pseudo-Mercator 
##   wkt:
## PROJCRS["WGS 84 / Pseudo-Mercator",
##     BASEGEOGCRS["WGS 84",
##         ENSEMBLE["World Geodetic System 1984 ensemble",
##             MEMBER["World Geodetic System 1984 (Transit)"],
##             MEMBER["World Geodetic System 1984 (G730)"],
##             MEMBER["World Geodetic System 1984 (G873)"],
##             MEMBER["World Geodetic System 1984 (G1150)"],
##             MEMBER["World Geodetic System 1984 (G1674)"],
##             MEMBER["World Geodetic System 1984 (G1762)"],
##             MEMBER["World Geodetic System 1984 (G2139)"],
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[2.0]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["Popular Visualisation Pseudo-Mercator",
##         METHOD["Popular Visualisation Pseudo Mercator",
##             ID["EPSG",1024]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Web mapping and visualisation."],
##         AREA["World between 85.06°S and 85.06°N."],
##         BBOX[-85.06,-180,85.06,180]],
##     ID["EPSG",3857]]
ca_subset_sf %>% raster::crs() ### to show proj4 string
## [1] "PROJCRS[\"WGS 84 / Pseudo-Mercator\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4326]],\n    CONVERSION[\"unnamed\",\n        METHOD[\"Popular Visualisation Pseudo Mercator\",\n            ID[\"EPSG\",1024]],\n        PARAMETER[\"Latitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"False easting\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"easting (X)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"northing (Y)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    USAGE[\n        SCOPE[\"Web mapping and visualisation.\"],\n        AREA[\"World between 85.06°S and 85.06°N.\"],\n        BBOX[-85.06,-180,85.06,180]],\n    ID[\"EPSG\",3857]]"

Look at it

Plot the California counties using geom_sf(). Notice that we can update aesthetics just like we would for a regular ggplot object. Here, we update the color based on land area (and change the color gradient).

ggplot(data = ca_subset_sf) +
  geom_sf(aes(fill = land_area), color = "white", size = 0.1) +
  theme_void() +
  scale_fill_gradientn(colors = c("cyan","blue","purple"))

Notice what aesthetics we didn’t have to specify here?

geom_sf knows to look for a column called geometry (or sometimes geom).

B. Invasive red sesbania records (spatial points)

Red sesbania (Sesbania punicea) is an invasive plant (see more information from the California Invasive Plants Council). Observations for locations of invasive red sesbania are from CA DFW. See metadata and information here: https://map.dfg.ca.gov/metadata/ds0080.html

The data exist in data/red_sesbania, and the shapefile is stored as ds80.shp. Let’s read in the data:

sesbania_sf <- read_sf(here("data","red_sesbania","ds80.shp")) %>%
  janitor::clean_names()

# Check the CRS:
sesbania_sf %>% st_crs()
## Coordinate Reference System:
##   User input: Custom 
##   wkt:
## PROJCRS["Custom",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6269]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Albers Equal Area",
##             ID["EPSG",9822]],
##         PARAMETER["Latitude of false origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-120,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",34,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",40.5,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",-4000000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
sesbania_sf %>% raster::crs()
## [1] "PROJCRS[\"Custom\",\n    BASEGEOGCRS[\"NAD83\",\n        DATUM[\"North American Datum 1983\",\n            ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n                LENGTHUNIT[\"metre\",1]],\n            ID[\"EPSG\",6269]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"Degree\",0.0174532925199433]]],\n    CONVERSION[\"unnamed\",\n        METHOD[\"Albers Equal Area\",\n            ID[\"EPSG\",9822]],\n        PARAMETER[\"Latitude of false origin\",0,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8821]],\n        PARAMETER[\"Longitude of false origin\",-120,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8822]],\n        PARAMETER[\"Latitude of 1st standard parallel\",34,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8823]],\n        PARAMETER[\"Latitude of 2nd standard parallel\",40.5,\n            ANGLEUNIT[\"Degree\",0.0174532925199433],\n            ID[\"EPSG\",8824]],\n        PARAMETER[\"Easting at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8826]],\n        PARAMETER[\"Northing at false origin\",-4000000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8827]]],\n    CS[Cartesian,2],\n        AXIS[\"(E)\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]],\n        AXIS[\"(N)\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]]"

Notice that this CRS is different from the California counties CRS, so we’ll want to update it to match. Use st_transform() to update the CRS:

### if you know the EPSG code:
sesbania_3857_sf <- st_transform(sesbania_sf, 3857)
### if you don't know the EPSG code:
sesbania_3857_2_sf <- st_transform(sesbania_sf, st_crs(ca_counties_sf))

# Then check it: 
sesbania_3857_sf %>% st_crs()
## Coordinate Reference System:
##   User input: EPSG:3857 
##   wkt:
## PROJCRS["WGS 84 / Pseudo-Mercator",
##     BASEGEOGCRS["WGS 84",
##         ENSEMBLE["World Geodetic System 1984 ensemble",
##             MEMBER["World Geodetic System 1984 (Transit)"],
##             MEMBER["World Geodetic System 1984 (G730)"],
##             MEMBER["World Geodetic System 1984 (G873)"],
##             MEMBER["World Geodetic System 1984 (G1150)"],
##             MEMBER["World Geodetic System 1984 (G1674)"],
##             MEMBER["World Geodetic System 1984 (G1762)"],
##             MEMBER["World Geodetic System 1984 (G2139)"],
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[2.0]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["Popular Visualisation Pseudo-Mercator",
##         METHOD["Popular Visualisation Pseudo Mercator",
##             ID["EPSG",1024]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Web mapping and visualisation."],
##         AREA["World between 85.06°S and 85.06°N."],
##         BBOX[-85.06,-180,85.06,180]],
##     ID["EPSG",3857]]

Cool, now they have the same CRS.

Plot them together!

Note: this may take a minute. Remember, later geoms go on top.

ggplot() +
  geom_sf(data = ca_subset_sf) +
  geom_sf(data = sesbania_3857_sf, size = 1, color = "red")

A bit of wrangling!

Let’s say we want to find the count of red sesbania observed locations in this dataset by county. How can I go about joining these data so that I can find counts? Don’t worry…st_join() has you covered for spatial joins!

ca_sesb_sf <- ca_subset_sf %>% 
  st_join(sesbania_3857_sf)

head(ca_sesb_sf)
## Simple feature collection with 6 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -13565690 ymin: 4582029 xmax: -13358430 ymax: 4833572
## Projected CRS: WGS 84 / Pseudo-Mercator
## # A tibble: 6 × 14
##   county_name land_…¹                  geometry    id info_…² last_updat prese…³
##   <chr>         <dbl>        <MULTIPOLYGON [m]> <dbl> <chr>   <date>     <chr>  
## 1 Sierra       2.47e9 (((-13431320 4821511, -1…    NA <NA>    NA         <NA>   
## 2 Sacramento   2.50e9 (((-13490651 4680832, -1…     6 Steven… 2002-10-07 Y      
## 3 Sacramento   2.50e9 (((-13490651 4680832, -1…    24 Robin … 2003-09-17 Y      
## 4 Sacramento   2.50e9 (((-13490651 4680832, -1…    25 Robin … 2003-09-17 Y      
## 5 Sacramento   2.50e9 (((-13490651 4680832, -1…    29 Robin … 2003-09-17 Y      
## 6 Sacramento   2.50e9 (((-13490651 4680832, -1…    30 Robin … 2003-09-17 Y      
## # … with 7 more variables: erad_work <chr>, location <chr>, county <chr>,
## #   watershed <chr>, latitude <dbl>, longitude <dbl>, notes <chr>, and
## #   abbreviated variable names ¹​land_area, ²​info_sourc, ³​presence

And then we can find counts (note: these are not counts for individual plants, but by record in the dataset) by county. We can’t just count the rows (e.g., using count()) because some rows are counties with no records (and sesbania information is all NAs)

sesb_counts_sf <- ca_sesb_sf %>% 
  group_by(county_name) %>%
  summarize(n_records = sum(!is.na(id)))

head(sesb_counts_sf)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -13668380 ymin: 4502606 xmax: -13307390 ymax: 4888059
## Projected CRS: WGS 84 / Pseudo-Mercator
## # A tibble: 6 × 3
##   county_name n_records                                                 geometry
##   <chr>           <int>                                       <MULTIPOLYGON [m]>
## 1 Alameda             1 (((-13612247 4538150, -13612347 4538291, -13612423 4538…
## 2 Alpine              0 (((-13366504 4678946, -13366493 4678954, -13366468 4678…
## 3 Amador              0 (((-13472698 4647652, -13472698 4647677, -13472698 4647…
## 4 Butte               5 (((-13565005 4798394, -13564992 4798529, -13565000 4798…
## 5 Calaveras           0 (((-13428575 4627725, -13428535 4627889, -13428535 4628…
## 6 Colusa              0 (((-13589905 4781178, -13589881 4781178, -13589804 4781…

Then we can plot a choropleth using the number of records for red sesbania as the fill color (instead of what we used previously, land area):

ggplot(data = sesb_counts_sf) +
  geom_sf(aes(fill = n_records), color = "white", size = 0.1) +
  scale_fill_gradientn(colors = c("lightgray","orange","red")) +
  theme_minimal() +
  labs(fill = "Number of S. punicea records")

So we see that we can still use our usual wrangling skills! Let’s do a bit more for fun, just to prove that our existing wrangling skills still work with spatial data - the spatial information just sticks to it! Only plot the county with the greatest number of red sesbania records (Solano), and make a map of those locations (yeah there are many ways to do this):

# Subset of sesbania point locations only in Solano County
solano_sesb_sf <- sesbania_sf %>% 
  filter(county == "Solano")

# Only keep Solano polygon from California County data
solano_sf <- ca_subset_sf %>% 
  filter(county_name == "Solano")

ggplot() +
  geom_sf(data = solano_sf) +
  geom_sf(data = solano_sesb_sf, color = 'red')

C. Making an interactive map with {tmap}

Sometimes we’ll want to make a map interactive so that audience members can zoom in, explore different areas, etc. We can use the {tmap} package to create an interactive map. Let’s make one for our California counties (fill aesthetic by land area) with the red sesbania locations on top:

# Set the viewing mode to "interactive":
tmap_mode(mode = "view")

# Then make a map (with the polygon fill color updated by variable 'land_area', updating the color palette to "BuGn"), then add another shape layer for the sesbania records (added as dots):
tm_shape(ca_subset_sf) +
  tm_fill("land_area", palette = "BuGn") +
  tm_shape(sesbania_sf) +
  tm_dots()

See all kinds of other cool ways you can update your interactive tmaps.

See: